High-Performance Computing Techniques for Record Linkage

Authors

  • Peter Christen
  • Markus Hegland
  • Stephen Roberts
  • Ole M. Nielsen
Abstract

Record linkage is the task of linking together information from one or more data sources representing the same entity (patient, customer, provider, business, etc.). If no unique identifier is available, probabilistic linkage techniques have to be applied.

Applications of record linkage:

  • Remove duplicates in a data set (internal linkage)
  • Merge new records into a larger master data set
  • Create patient-oriented statistics
  • Compile data for longitudinal studies
  • Clean data sets for data mining projects or mailing lists
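As a rough illustration of probabilistic linkage applied to duplicate removal, the sketch below scores record pairs with Fellegi-Sunter style agreement weights, a common formulation that the abstract itself does not name. The field names, the m/u probabilities, the threshold, and the sample records are all hypothetical and chosen only for the example. It compares every pair of records, which is only practical for very small data sets.

```python
import math
from itertools import combinations

# Hypothetical per-field probabilities (illustrative values, not from the source):
# m = P(field agrees | records refer to the same entity)
# u = P(field agrees | records refer to different entities)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 agreement or disagreement weight of every field."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def find_duplicates(records, threshold=5.0):
    """Return index pairs whose weight reaches the assumed threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if match_weight(a, b) >= threshold
    ]


# Made-up sample records for internal linkage (deduplication).
records = [
    {"surname": "smith", "given_name": "john", "birth_year": 1970},
    {"surname": "smith", "given_name": "jon", "birth_year": 1970},
    {"surname": "miller", "given_name": "anna", "birth_year": 1985},
]

duplicate_pairs = find_duplicates(records)
```

With these assumed values, the first two records agree on surname and birth year, so the pair is flagged as a likely duplicate even though the given names differ. Exact string equality is used here for brevity; approximate string comparison would be a natural refinement.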

Similar resources

Parallel Computing Techniques for High-Performance Probabilistic Record Linkage

Record linkage techniques are used to link together records from one or more data sets relating to the same entity, e.g. patient or customer. As data is often not primarily collected for data analysis purposes, a common unique identifier is missing in many cases, and probabilistic linkage techniques have to be applied. Historical collections of administrative and other (health) data nowadays c...

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

Scaling Private Record Linkage using Output Constrained Differential Privacy

Many scenarios require computing the join of databases held by two or more parties that do not trust one another. Private record linkage is a cryptographic tool that allows such a join to be computed without leaking any information about records that do not participate in the join output. However, such strong security comes with a cost: except for exact equi-joins, these techniques have a high ...

Utilization of Soft Computing for Evaluating the Performance of Stone Sawing Machines, Iranian Quarries

The escalating construction industry has led to a drastic increase in the dimension stone demand in the construction, mining and industry sectors. Assessment and investigation of mining projects and stone processing plants such as sawing machines is necessary to manage and respond to the sawing performance; hence, the soft computing techniques were considered as a challenging task due to stocha...

Scaling Record Linkage to Non-uniform Distributed Class Sizes

Record linkage is a central task when information from different sources is integrated. Record linkage models use so-called blockers for reducing the search space by discarding obviously different record pairs. In practice, important problems have Zipf distributed class sizes with some large classes where blocking is not applicable any more. Therefore we propose two novel meta algorithms for sc...

Publication date: 2002